Although Erik is responsible for the largest outlier in this period, once outlier values are removed from each dataset, Erik-written articles receive fewer Page Views and Facebook Shares than the rest of the content.
While the 1st quartiles of Erik's articles and of all other content are quite similar, the performance of Erik's articles is clustered at the lower end of the traffic scale.
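One common way to "eliminate outliers" is the 1.5 × IQR rule, which is also matplotlib's default for deciding which points a box plot draws as fliers. The notebook does not state the exact rule used, so the sketch below is only an illustration under that assumption; `drop_outliers` is a hypothetical helper and the numbers are made up.

```python
import pandas as pd

def drop_outliers(values, whis=1.5):
    """Keep values inside [Q1 - whis*IQR, Q3 + whis*IQR] (hypothetical helper)."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return values[(values >= q1 - whis * iqr) & (values <= q3 + whis * iqr)]

# Made-up page-view counts, for illustration only (not the real data).
sample = pd.Series([120, 150, 180, 200, 220, 250, 9000])
trimmed = drop_outliers(sample)

print(sample.median())   # median with the extreme value included
print(trimmed.median())  # median after the extreme value is dropped
```

A single very large value barely moves the median but dominates the mean, which is why the comparison in this notebook relies on medians and quartiles.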
In [102]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In [103]:
df_all = pd.read_csv('All content.csv')
In [104]:
df_erik = pd.read_csv('Erik content.csv')
In [105]:
# Compare against non-Erik articles published between two of Erik's publish dates.
start, end = df_erik.at[6, 'Published'], df_erik.at[16, 'Published']
df_all = df_all[(df_all.Published > start) & (df_all.Published < end) &
                df_all['Url'].str.contains('/articles/') &
                ~df_all.Title.isin(df_erik.Title)]  # boolean mask, not a Series of titles
df_erik = df_erik[df_erik.Published < end]
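Excluding one frame's rows from another by title needs a boolean mask, which `~Series.isin(...)` provides. A small sketch with hypothetical toy frames (the titles and counts are invented):

```python
import pandas as pd

# Toy frames with hypothetical titles, for illustration only.
all_toy = pd.DataFrame({'Title': ['A', 'B', 'C', 'D'],
                        'Page Views': [10, 20, 30, 40]})
erik_toy = pd.DataFrame({'Title': ['B', 'D']})

# True where the title is NOT one of Erik's.
not_erik = ~all_toy['Title'].isin(erik_toy['Title'])
others = all_toy[not_erik]

print(others['Title'].tolist())
```

The mask has one True/False entry per row of `all_toy`, so it can be combined with other row conditions using `&`.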
In [106]:
print('All PVs median')
print(df_all['Page Views'].median())
print('Erik PVs median')
print(df_erik['Page Views'].median())
In [107]:
df_all['Page Views'].describe()
Out[107]:
In [108]:
df_erik['Page Views'].describe()
Out[108]:
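The two summaries can be easier to compare when placed side by side. A minimal sketch using `pd.concat` on the `describe()` results, with made-up series standing in for the real Page Views columns:

```python
import pandas as pd

# Made-up values standing in for the two Page Views columns.
erik_pv = pd.Series([100, 200, 300, 400], name='erik')
all_pv = pd.Series([100, 300, 500, 700, 900], name='all')

# describe() keeps each Series name, so the names become column labels.
summary = pd.concat([erik_pv.describe(), all_pv.describe()], axis=1)

# Show only the quartiles discussed in the text.
print(summary.loc[['25%', '50%', '75%']])
```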
In [110]:
d = {'erik':df_erik['Page Views'],'all':df_all['Page Views']}
df = pd.DataFrame(data=d)
df.plot(kind='box',showfliers=False, title = "Page Views Distribution")
Out[110]:
In [111]:
shares = {'erik shares':df_erik['Facebook Shares'],'all_shares':df_all['Facebook Shares']}
df_shares = pd.DataFrame(data=shares)
df_shares.plot(kind='box',showfliers=False, title = "Facebook Shares Distribution")
Out[111]:
In [112]:
d = {'erik':df_erik['Page Views'],'all':df_all['Page Views']}
df = pd.DataFrame(data=d)
df.plot(kind='box',showfliers=True, title = "Page Views Distribution")
Out[112]:
In [113]:
shares = {'erik shares':df_erik['Facebook Shares'],'all_shares':df_all['Facebook Shares']}
df_shares = pd.DataFrame(data=shares)
df_shares.plot(kind='box',showfliers=True, title = "Facebook Shares Distribution")
Out[113]: